Test: WebView-Oriented Testing for Android Applications
WebView is a UI widget that helps integrate web applications into the native
context of Android apps. It provides powerful mechanisms for bi-directional
interactions between the native-end (Java) and the web-end (JavaScript) of an
Android app. However, these interaction mechanisms are complicated and have
induced various types of bugs. To mitigate the problem, various techniques have
been proposed to detect WebView-induced bugs via dynamic analysis, which
heavily relies on executing tests to explore WebView behaviors. Unfortunately,
these techniques either require manual effort or adopt random test generation
approaches, which are not able to effectively explore diverse WebView
behaviors. In this paper, we study the problem of test generation for WebViews
in Android apps. Effective test generation for WebViews requires identifying
the essential program properties to be covered by the generated tests. To this
end, we propose WebView-specific properties to characterize WebView behaviors,
and devise a cross-language dynamic analysis method to identify these
properties. We develop Test, a test generation technique that searches
for event sequences covering the identified WebView-specific properties. An
evaluation on 74 real-world open-/closed-source Android apps shows that
Test can cover diverse WebView behaviors and detect WebView-induced
bugs effectively. Test detected 36 previously unknown bugs. Of the 22
bugs that we reported to the app developers, 13 were confirmed, 9 of
which were fixed. Comment: Accepted by the 32nd ACM SIGSOFT International Symposium on Software
Testing and Analysis (ISSTA 2023)
Fuzzing Deep Learning Compilers with HirGen
Deep Learning (DL) compilers are widely adopted to optimize advanced DL
models for efficient deployment on diverse hardware. Their quality has a profound
effect on the quality of compiled DL models. A recent bug study shows that the
optimization of high-level intermediate representation (IR) is the most
error-prone compilation stage: bugs in this stage account for 44.92% of
all the collected bugs. However, existing testing techniques do not consider
high-level-optimization-related features (e.g., high-level IR), and are
therefore weak in exposing bugs at this stage. To bridge this gap, we propose
HirGen, an automated testing technique that aims to effectively expose coding
mistakes in the optimization of high-level IR. The design of HirGen includes 1)
three coverage criteria to generate diverse and valid computational graphs; 2)
full use of the high-level IR's language features to generate diverse IRs; and 3) three
test oracles inspired by both differential testing and metamorphic testing.
HirGen has successfully detected 21 bugs in TVM, with 17 bugs
confirmed and 12 fixed. Further, we construct four baselines using the
state-of-the-art DL compiler fuzzers that can cover the high-level optimization
stage. Our experiment results show that HirGen can detect 10 crashes and
inconsistencies that cannot be detected by the baselines in 48 hours. We
further validate the usefulness of our proposed coverage criteria and test
oracles in the evaluation.
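Some of HirGen's oracles are inspired by differential testing. The sketch below shows that general idea in plain Python under stated assumptions: `run_reference` and `run_optimized` are hypothetical callables standing in for executing the same computational graph with high-level optimizations disabled and enabled; a crash in either run, or outputs that differ beyond a tolerance, is reported. It is not HirGen's implementation, and the tolerance defaults are arbitrary.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def differential_oracle(
    run_reference: Callable[[Vector], Vector],
    run_optimized: Callable[[Vector], Vector],
    inputs: Sequence[Vector],
    rel_tol: float = 1e-5,
    abs_tol: float = 1e-8,
) -> List[Tuple[Vector, str]]:
    """Run each input through both configurations and report disagreements.

    Hypothetical sketch: the two callables stand in for executing one
    computational graph with high-level optimizations off and on.
    """
    findings = []
    for x in inputs:
        outcomes = []
        for run in (run_reference, run_optimized):
            try:
                outcomes.append(("ok", list(run(x))))
            except Exception as exc:  # a crash is itself a finding
                outcomes.append(("crash", type(exc).__name__))
        (status_a, out_a), (status_b, out_b) = outcomes
        if "crash" in (status_a, status_b):
            findings.append((x, "crash"))
        elif len(out_a) != len(out_b) or not all(
            math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
            for a, b in zip(out_a, out_b)
        ):
            findings.append((x, "inconsistency"))
    return findings


def reference(xs: Vector) -> List[float]:
    return [2.0 * v for v in xs]


def optimized(xs: Vector) -> List[float]:  # deliberately buggy toy: skips negatives
    return [2.0 * v if v >= 0 else v for v in xs]


print(differential_oracle(reference, optimized, [(1.0, 2.0), (-1.0, 3.0)]))
# → [((-1.0, 3.0), 'inconsistency')]
```

Comparing with a tolerance rather than exact equality matters here, because legitimate optimizations such as operator fusion can reorder floating-point arithmetic and change the last few bits of a result.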
Programming by Example Made Easy
Programming by example (PBE) is an emerging programming paradigm that
automatically synthesizes programs specified by user-provided input-output
examples. Despite the convenience for end-users, implementing PBE tools often
requires strong expertise in programming languages and synthesis algorithms.
Such a level of knowledge is uncommon among software developers, which greatly
limits the broad adoption of PBE by industry. To facilitate the adoption of
PBE techniques, we propose a PBE framework called Bee, which leverages an
"entity-action" model based on relational tables to ease PBE development for a
wide but restricted range of domains. Implementing PBE tools with Bee only
requires adapting domain-specific data entities and user actions to tables,
with no need to design a domain-specific language or an efficient synthesis
algorithm. The synthesis algorithm of Bee exploits bidirectional searching and
constraint-solving techniques to address the challenge of value computation
nested in table transformation. We evaluated Bee's effectiveness on 64 PBE
tasks from three different domains and its usability through a human study with 12
participants. Evaluation results show that Bee is easier to learn and use than
the state-of-the-art PBE framework, and that the bidirectional algorithm achieves
performance comparable to domain-specifically optimized synthesizers. Comment: Accepted by ACM Transactions on Software Engineering and Methodology
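Bee's synthesis algorithm combines bidirectional searching with constraint solving. As a loose illustration of the bidirectional-search part only, here is a toy sketch in Python. It assumes a table is a tuple of rows and uses three hypothetical, invertible row-level actions (`reverse_rows`, `rotate_up`, `swap_first_two`); none of these names come from Bee. The search grows a forward frontier from the input example and a backward frontier from the output example (by applying inverse actions) until the two meet.

```python
from typing import Callable, Dict, List, Optional, Tuple

Table = Tuple[Tuple[str, ...], ...]  # a table is a tuple of rows
Action = Callable[[Table], Table]
Parents = Dict[Table, Optional[Tuple[Table, str]]]


# Hypothetical row-level actions; each is paired with its inverse below.
def reverse_rows(t: Table) -> Table:
    return t[::-1]

def rotate_up(t: Table) -> Table:
    return t[1:] + t[:1]

def rotate_down(t: Table) -> Table:
    return t[-1:] + t[:-1]

def swap_first_two(t: Table) -> Table:
    return (t[1], t[0]) + t[2:] if len(t) >= 2 else t

ACTIONS: Dict[str, Tuple[Action, Action]] = {
    "reverse_rows": (reverse_rows, reverse_rows),
    "rotate_up": (rotate_up, rotate_down),
    "swap_first_two": (swap_first_two, swap_first_two),
}


def _expand(frontier: List[Table], seen: Parents, other: Parents, forward: bool):
    """Grow one search direction by one level; stop early if it meets the other."""
    next_frontier: List[Table] = []
    for state in frontier:
        for name, (act, inv) in ACTIONS.items():
            nxt = act(state) if forward else inv(state)
            if nxt in seen:
                continue
            seen[nxt] = (state, name)
            if nxt in other:
                return next_frontier, nxt
            next_frontier.append(nxt)
    return next_frontier, None


def _path(meet: Table, fwd: Parents, bwd: Parents) -> List[str]:
    head, s = [], meet
    while fwd[s] is not None:  # walk back towards the input example
        s, name = fwd[s]
        head.append(name)
    tail, s = [], meet
    while bwd[s] is not None:  # walk on towards the output example
        s, name = bwd[s]
        tail.append(name)
    return head[::-1] + tail


def synthesize(inp: Table, out: Table, max_levels: int = 8) -> Optional[List[str]]:
    """Bidirectional BFS for an action sequence that maps inp to out."""
    if inp == out:
        return []
    fwd: Parents = {inp: None}
    bwd: Parents = {out: None}
    f_front, b_front = [inp], [out]
    for _ in range(max_levels):
        if len(f_front) <= len(b_front):  # expand the smaller frontier
            f_front, meet = _expand(f_front, fwd, bwd, forward=True)
        else:
            b_front, meet = _expand(b_front, bwd, fwd, forward=False)
        if meet is not None:
            return _path(meet, fwd, bwd)
    return None


rows = (("alice", "30"), ("bob", "25"), ("carol", "41"))
print(synthesize(rows, rows[::-1]))  # → ['reverse_rows']
```

Searching from both ends lets each frontier stop at roughly half the solution depth, which is the usual motivation for bidirectional search. The sketch leaves out value computation inside cells, which Bee addresses with constraint solving.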
MEMO: Coverage-guided Model Generation For Deep Learning Library Testing
Recent deep learning (DL) applications are mostly built on top of DL
libraries. The quality assurance of these libraries is critical to the
dependable deployment of DL applications. A few techniques have therefore been
proposed to test DL libraries by generating DL models as test inputs. These
techniques then feed the generated models to DL libraries for inference, in
order to exercise the library modules involved in executing a DL model.
However, the test effectiveness of these techniques is constrained by the
diversity of generated DL models. Our investigation finds that these techniques
can cover at most 11.7% of layer pairs (i.e., call sequence between two layer
APIs) and 55.8% of layer parameters (e.g., "padding" in Conv2D). As a result,
we find that many bugs arising from specific layer pairs and parameters can be
missed by existing techniques.
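The layer-pair metric above can be made concrete with a small sketch. It assumes each generated model is a linear sequence of layer API names and counts a pair only when the two layers are adjacent. It also takes the universe of pairs to be every ordered pair over a given set of layer types. Both choices are simplifications for illustration and may differ from how the paper measures coverage. The layer names other than `Conv2D` are hypothetical examples.

```python
from typing import Iterable, List, Set, Tuple


def layer_pair_coverage(
    models: Iterable[List[str]], layer_types: Set[str]
) -> Tuple[float, Set[Tuple[str, str]]]:
    """Fraction of ordered layer-type pairs that appear as adjacent layers."""
    covered: Set[Tuple[str, str]] = set()
    for layers in models:
        # Adjacent layers in a sequential model form one "call sequence" pair.
        for first, second in zip(layers, layers[1:]):
            if first in layer_types and second in layer_types:
                covered.add((first, second))
    universe = len(layer_types) ** 2
    return (len(covered) / universe if universe else 0.0), covered


models = [["Conv2D", "ReLU", "Dense"], ["Conv2D", "ReLU"]]
ratio, pairs = layer_pair_coverage(models, {"Conv2D", "ReLU", "Dense"})
print(f"{ratio:.3f}", sorted(pairs))
# → 0.222 [('Conv2D', 'ReLU'), ('ReLU', 'Dense')]
```

Because the second model only repeats a pair the first already covers, it adds nothing to the metric. This is the kind of redundancy that makes model diversity, rather than model count, the limiting factor.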
In view of the limitations of existing DL library testing techniques, we
propose MEMO to efficiently generate diverse DL models by exploring layer
types, layer pairs, and layer parameters. MEMO (1) designs an initial model
reduction technique to boost test efficiency without compromising model
diversity; and (2) designs a set of mutation operators for a customized Markov
Chain Monte Carlo (MCMC) algorithm to explore new layer types, layer pairs, and
layer parameters. We evaluate MEMO on seven popular DL libraries, including
four for model execution (TensorFlow, PyTorch, MXNet, and ONNX) and three
for model conversion (Keras-MXNet, TF2ONNX, ONNX2PyTorch). The evaluation
results show that MEMO outperforms recent works by covering 10.3% more layer
pairs, 15.3% more layer parameters, and 2.3% more library branches. Moreover, MEMO
detects 29 new bugs in the latest versions of DL libraries, with 17 of them
confirmed by DL library developers, and 5 of those confirmed bugs have been
fixed. Comment: 11 pages, 8 figures
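The description above does not spell out how MEMO's customized MCMC algorithm chooses among mutation operators. The sketch below shows one common Metropolis-Hastings pattern under stated assumptions. Each operator's score is the fraction of its past mutants judged successful; "successful" could mean, for example, reaching a new layer type, pair, or parameter value, which is a hypothetical criterion. Operators are ranked by score, and a uniformly proposed operator that sits k ranks below the current one is accepted with probability (1 - p)^k. The class, method, and operator names are hypothetical.

```python
import random
from typing import Dict, List, Optional, Sequence


class MCMCOperatorSelector:
    """Metropolis-Hastings selection over mutation operators ranked by score."""

    def __init__(self, operators: Sequence[str], p: Optional[float] = None,
                 seed: Optional[int] = None) -> None:
        self.ops: List[str] = list(operators)
        # p in (0, 1] controls how strongly lower-ranked operators are penalized.
        self.p = p if p is not None else 1.0 / len(self.ops)
        self.stats: Dict[str, List[int]] = {op: [0, 0] for op in self.ops}  # [successes, trials]
        self.rng = random.Random(seed)
        self.current = self.rng.choice(self.ops)

    def score(self, op: str) -> float:
        successes, trials = self.stats[op]
        return successes / trials if trials else 0.0

    def record(self, op: str, success: bool) -> None:
        """Feed back whether a mutant produced by `op` was judged successful."""
        self.stats[op][1] += 1
        if success:
            self.stats[op][0] += 1

    def next_operator(self) -> str:
        ranked = sorted(self.ops, key=self.score, reverse=True)  # rank 0 = best
        k_current = ranked.index(self.current)
        while True:
            candidate = self.rng.choice(self.ops)  # uniform, symmetric proposal
            k_candidate = ranked.index(candidate)
            # Candidates ranked at or above the current one are always accepted.
            accept = (1.0 - self.p) ** max(0, k_candidate - k_current)
            if self.rng.random() < accept:
                self.current = candidate
                return candidate


selector = MCMCOperatorSelector(["add_layer", "change_param", "swap_pair"], seed=0)
selector.record("change_param", success=True)  # hypothetical feedback
selector.record("add_layer", success=False)
op = selector.next_operator()
```

With a symmetric proposal, this acceptance rule makes the chain visit each operator in proportion to (1 - p)^rank. Productive operators are therefore favored, while weaker ones are still tried occasionally, which keeps exploration alive.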
Towards Modeling Software Quality of Virtual Reality Applications from Users' Perspectives
Virtual Reality (VR) technology has become increasingly popular in recent
years as a key enabler of the Metaverse. VR applications have unique
characteristics, including the revolutionized human-computer interaction
mechanisms, that distinguish them from traditional software. Hence, user
expectations for the software quality of VR applications diverge from those for
traditional software. Investigating these quality expectations is crucial for
the effective development and maintenance of VR applications, yet it remains
under-explored in prior research.
To bridge the gap, we conduct the first large-scale empirical study to model
the software quality of VR applications from users' perspectives. To this end,
we analyze 1,132,056 user reviews of 14,150 VR applications across seven app
stores through a semiautomatic review mining approach. We construct a taxonomy
of 12 software quality attributes that are of major concern to VR users. Our
analysis reveals that VR-specific quality attributes, which are closely related
to the most distinctive properties of VR applications such as revolutionized
interaction mechanisms and immersive experiences, are of utmost importance to
users. Our examination of relevant user complaints reveals the major
factors impacting user satisfaction with VR-specific quality attributes. We
identify that poor design or implementation of the movement mechanisms, control
mechanisms, multimedia systems, and physics can significantly degrade the user
experience. Moreover, we discuss the implications of VR quality assurance for
both developers and researchers to shed light on future work. For instance, we
suggest developers implement sufficient accessibility and comfort options for
users with mobility limitations, sensory impairments, and other specific needs
to customize the interaction mechanisms. Our datasets and results will be
released to facilitate follow-up studies.